13 research outputs found

    Modulation spectral features for speech emotion recognition using deep neural networks

    This work explores the use of constant-Q transform based modulation spectral features (CQT-MSF) for speech emotion recognition (SER). Human perception and analysis of sound comprise two important cognitive parts: early auditory analysis and cortex-based processing. The early auditory analysis considers a spectrogram-based representation, whereas cortex-based analysis includes extraction of temporal modulations from the spectrogram. This temporal modulation representation of the spectrogram is called the modulation spectral feature (MSF). As the constant-Q transform (CQT) provides higher resolution in the emotion-salient low-frequency regions of speech, we find that the CQT-based spectrogram, together with its temporal modulations, provides a representation enriched with emotion-specific information. We argue that CQT-MSF, when used with a 2-dimensional convolutional network, can provide a time-shift invariant and deformation-insensitive representation for SER. Our results show that CQT-MSF outperforms the standard mel-scale based spectrogram and its modulation features on two popular SER databases, Berlin EmoDB and RAVDESS. We also show that our proposed feature outperforms the shift- and deformation-invariant scattering transform coefficients, which shows the importance of joint hand-crafted and self-learned feature extraction rather than reliance on completely hand-crafted features. Finally, we perform Grad-CAM analysis to visually inspect the contribution of constant-Q modulation features to SER.
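    As a rough illustration of what "temporal modulations from the spectrogram" can mean, the sketch below takes the magnitude DFT of a single frequency channel's envelope over time. This is one common way to obtain a modulation spectrum and is not necessarily the authors' procedure; the function name, the frame rate of 100 frames/s, and the synthetic 4 Hz envelope are all assumptions made for the example.

```python
import cmath
import math


def modulation_spectrum(envelope):
    """Magnitude DFT of one spectrogram channel's temporal envelope (hypothetical helper)."""
    n = len(envelope)
    mean = sum(envelope) / n
    centered = [x - mean for x in envelope]  # remove the DC offset so modulations stand out
    magnitudes = []
    for m in range(n // 2 + 1):  # non-negative modulation frequencies only
        acc = sum(x * cmath.exp(-2j * math.pi * m * t / n)
                  for t, x in enumerate(centered))
        magnitudes.append(abs(acc) / n)
    return magnitudes


# Assumed setup: 100 spectrogram frames per second, 1 second of signal.
frame_rate = 100
n_frames = 100

# Synthetic envelope of one channel, amplitude-modulated at 4 Hz.
envelope = [1.0 + 0.5 * math.sin(2 * math.pi * 4 * t / frame_rate)
            for t in range(n_frames)]

spectrum = modulation_spectrum(envelope)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * frame_rate / n_frames  # modulation frequency of the strongest bin
```

    Applying the same transform to every frequency channel of a spectrogram would give a two-dimensional (acoustic frequency by modulation frequency) representation of the kind the abstract describes.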

    Deep scattering network for speech emotion recognition

    This paper introduces the scattering transform for speech emotion recognition (SER). The scattering transform generates feature representations that remain stable to deformations and shifts in time and frequency without much loss of information. In speech, the emotion cues are spread across time and localised in frequency. The time and frequency invariance of scattering coefficients provides a representation that is robust against emotion-irrelevant variations (e.g., different speakers, languages, and genders) while preserving the variations caused by emotion cues. Hence, such a representation captures the emotion information in speech more efficiently. We perform experiments to compare scattering coefficients with standard mel-frequency cepstral coefficients (MFCCs) over different databases. We observe that frequency scattering performs better than time-domain scattering and MFCCs. We also investigate layer-wise scattering coefficients to analyse the importance of time-shift and deformation-stable scalogram and modulation spectrum coefficients for SER. We observe that layer-wise coefficients taken independently also perform better than MFCCs.

    Non-linear frequency warping using constant-Q transformation for speech emotion recognition

    In this work, we explore the constant-Q transform (CQT) for speech emotion recognition (SER). The CQT-based time-frequency analysis provides variable spectro-temporal resolution, with higher frequency resolution at lower frequencies. Since lower-frequency regions of the speech signal contain more emotion-related information than higher-frequency regions, the increased low-frequency resolution of the CQT makes it more promising for SER than the standard short-time Fourier transform (STFT). We present a comparative analysis of short-term acoustic features based on STFT and CQT for SER, with a deep neural network (DNN) as the back-end classifier. We optimize different parameters for both features. The CQT-based features outperform the STFT-based spectral features in SER experiments. Further experiments with cross-corpora evaluation demonstrate that the CQT-based systems generalize better with out-of-domain training data.
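    The "higher frequency resolution at lower frequencies" property follows from the standard constant-Q definition, in which center frequencies are geometrically spaced and each bin's bandwidth is its center frequency divided by a fixed Q. The sketch below works through that arithmetic. The parameter values (minimum frequency 32.7 Hz, 12 bins per octave, 84 bins, and a 512-point STFT at 16 kHz for comparison) are assumptions chosen for illustration and are not taken from the paper.

```python
def cqt_bins(f_min, bins_per_octave, n_bins):
    """Return (center_frequency_hz, bandwidth_hz) pairs for a constant-Q filterbank."""
    # Q is the constant ratio of center frequency to bandwidth.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    bins = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)  # geometric spacing
        bins.append((f_k, f_k / q))                 # bandwidth grows with frequency
    return bins


# Assumed illustrative parameters (not from the paper).
bins = cqt_bins(f_min=32.7, bins_per_octave=12, n_bins=84)

# An STFT has the same bin spacing everywhere: sample_rate / fft_size.
stft_resolution_hz = 16000 / 512

lowest_bandwidth_hz = bins[0][1]    # narrow bin: fine resolution at low frequency
highest_bandwidth_hz = bins[-1][1]  # wide bin: coarse resolution at high frequency
```

    With these assumed settings, the lowest CQT bin is far narrower than the fixed STFT bin spacing and the highest CQT bin is far wider, which is the trade-off the abstract relies on.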

    Analysis of constant-Q filterbank based representations for speech emotion recognition

    This work analyzes constant-Q filterbank-based time-frequency representations for speech emotion recognition (SER). A constant-Q filterbank provides a non-linear spectro-temporal representation with higher frequency resolution at low frequencies. Our investigation reveals how the increased low-frequency resolution benefits SER. The time-domain comparative analysis between short-term mel-frequency spectral coefficients (MFSCs) and constant-Q filterbank-based features, namely the constant-Q transform (CQT) and the continuous wavelet transform (CWT), reveals that constant-Q representations provide higher time-invariance at low frequencies. This provides increased robustness against emotion-irrelevant temporal variations in pitch, especially for low-arousal emotions. The corresponding frequency-domain analysis over different emotion classes shows better resolution of pitch harmonics in constant-Q-based time-frequency representations than in MFSCs. These advantages of constant-Q representations are further supported by SER performance in an extensive evaluation of the features over four publicly available databases, with six advanced deep neural network architectures as the back-end classifiers. Our findings point toward the suitability and potential of constant-Q features for SER.

    Towards Smart and Cost-effective Bridge Infrastructure Monitoring Systems

    Bridge health monitoring (BHM) has recently gained significant interest worldwide in the inspection and maintenance of aging bridge infrastructure in the era of climate change and adverse weather conditions. However, the extensive datasets resulting from these monitoring systems require appropriate tools to diagnose the data systematically under various operating conditions of bridges, leading to expensive and time-intensive BHM strategies. To mitigate this challenge, a smart and cost-effective bridge infrastructure management system is urgently needed. This thesis aims to develop a suite of cost-effective bridge management strategies by employing limited and mobile sensing technology and addressing their inherent challenges in real-world situations. First, a cost-effective approach based on a limited number of sensors is developed to analyze the traffic-induced nonstationary vibration response of the bridge. The proposed technique can deal with practical challenges of direct BHM, such as traffic interruptions, bridge closures, limited space, and the limited number of sensors, thereby eliminating the need for high labor and equipment costs. Second, the visualization of BHM data is explored for systematic diagnosis of the bridge data. A visualization tool based on Bridge Information Modeling (BrIM) is proposed that is suitable for real-time system identification of bridges. The objective of the proposed tool is to take one step forward from static to dynamic BrIM by representing and visualizing real-time BHM data. Contact-based BHM usually involves direct instrumentation with sensors to extract the modal parameters from the ambient or forced vibrations. As an alternative to direct BHM, indirect BHM (iBHM) has emerged as a promising avenue for effective and inexpensive monitoring of bridge infrastructure. However, the existing iBHM methods face challenges associated with the accurate identification of bridge properties under various driving and vehicle conditions. In this thesis, a hybrid time-frequency method is proposed for decoupling vehicle-bridge interactions and performing robust bridge modal identification under various operational challenges. The method is capable of bridge condition assessment using the response of a vehicle traveling over a bridge, resulting in a smart drive-by BHM technology. The vehicle response in iBHM is often criticized because the presence of the vehicle frequency can make vehicle scanning ineffective. Therefore, this thesis also explores the robust contact point (CP)-based BHM method, which is free from vehicle conditions and provides more accurate estimates of bridge frequencies.

    Novel dietary lipid-based self-nanoemulsifying drug delivery systems of paclitaxel with p-gp inhibitor: implications on cytotoxicity and biopharmaceutical performance

    Objectives: This work describes the development and characterization of novel self-nanoemulsifying drug delivery systems (SNEDDS) employing polyunsaturated fatty acids for enhancing the oral bioavailability and anticancer activity of paclitaxel (PTX) by coadministration with curcumin (Cu). Methods: Preformulation studies endorsed sesame oil, labrasol, and sodium deoxycholate as lipid surfactants and cosurfactants based on their solubility for the drugs and spontaneity of emulsification to produce nanoemulsions. Further, phase titration studies were performed to identify a suitable nanoemulsion region for preparing the SNEDDS formulation. Results: The prepared formulations were characterized through in vitro, in situ, and in vivo studies to evaluate the biopharmaceutical performance. In vitro drug release studies showed 2.8- to 3.4-fold enhancement in the dissolution rate of both drugs from SNEDDS as compared with the pure drug suspension. Cell line studies revealed 1.5- to 2.7-fold reduction in the cytotoxicity on MCF-7 cells by plain PTX-SNEDDS and PTX-Cu-SNEDDS vis-à-vis the PTX-suspension. In situ intestinal perfusion studies revealed significant augmentation in permeability and absorption parameters of drug from PTX-Cu-SNEDDS over the plain PTX-SNEDDS and PTX-suspension (p < 0.001). In vivo pharmacokinetic studies also showed a remarkable improvement (i.e., 5.8- to 6.3-fold) in the oral bioavailability (Cmax and AUC) of the drug from PTX-SNEDDS and PTX-Cu-SNEDDS vis-à-vis the PTX-suspension. Conclusions: Overall, the studies corroborated superior biopharmaceutical performance of PTX-Cu-SNEDDS.

    ROGAVF STUDY 2019 - Relationship of HbA1C (GLYCEMIC Control) on outcomes of AV FISTULAS: A prospective observational study

    Objective: The main aim of the study was to compare outcomes based on diabetic control for patients undergoing formation of a new upper limb arteriovenous fistula (AVF). Research design and methods: A prospective cohort study was performed in which we obtained baseline HbA1c in 65 patients before they underwent AV fistula formation. Patients were followed up at our clinic 6 weeks after creation to assess fistula maturity. Results: Multiple logistic regression was used to analyze the association between HbA1c status and maturity of AVF at 6 weeks after controlling for possible confounding factors such as age, sex, and presence of hypertension and dyslipidaemia. Those with HbA1c less than 6.5 had about 22 times the odds of AVF maturity at 6 weeks compared with those with HbA1c of 6.5 or more (AOR = 22.65, p < 0.005). Conclusion: Good diabetes control, reflected by an HbA1c of less than 6.5, is associated with a high likelihood of AVF maturity at 6 weeks post creation.